Computer method and apparatus for constraining a non-linear approximator of an empirical process

ABSTRACT

A constrained non-linear approximator for empirical process control is disclosed. The approximator constrains the behavior of the derivative of a subject empirical model without adversely affecting the ability of the model to represent generic non-linear relationships. There are three stages to developing the constrained non-linear approximator. The first stage is the specification of the general shape of the gain trajectory or base non-linear function which is specified graphically, algebraically or generically and is used as the basis for transfer functions used in the second stage. The second stage of the invention is the interconnection of the transfer functions to allow non-linear approximation. The final stage of the invention is the constrained optimization of the model coefficients such that the general shape of the input/output mappings (and their corresponding derivatives) are conserved.

RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 60/214,875, filed on Jun. 29, 2000. The entire teachings of the above application are incorporated herein by reference.

BACKGROUND OF THE INVENTION

It has been a customary practice for many years to utilize universal approximators such as neural networks when attempting to model complex non-linear, multi-variable functions. Industrial application of such technologies has been particularly prevalent in the area of inferential or soft sensor predictors. For example, see Neuroth, M., MacConnell, P., Stronach, F., Vamplew, P. (April 2000): “Improved modeling and control of oil and gas transport facility operations using artificial intelligence.”, Knowledge Based Systems, vol. 13, no. 2, pp. 81-9; and Molga, E. J., van Woezik, B. A. A., Westerterp, K. R.: “Neural networks for modeling of chemical reaction systems with complex kinetics: oxidation of 2-octanol with nitric acid”, Chemical Engineering and Processing, July 2000, vol. 39, no. 4, pp. 323-334. Many industrial processes require quality control of properties that are still expensive, if not impossible, to measure on-line. Inferential quality estimators have been utilized to predict such qualities from easy-to-measure process variables, such as temperatures, pressures, etc. Often, the complex interactions within a process (particularly in polymer processes) manifest as complex non-linear relationships between the easy-to-measure variables and the complex quality parameters.

Historically, conventional neural networks (or other generic non-linear approximators) have been used to represent these complex non-linearities. For example, see Zhang, J., Morris, A. J., Martin, E. B., Kiparissides, C.: “Estimation of impurity and fouling in batch polymerization reactors through application of neural networks”, Computers in Chemical Engineering, Feb. 1999, vol. 23, no. 3, pp. 301-314; and Huafang, N., Hunkeler, D.: “Prediction of copolymer composition drift using artificial neural networks: copolymerization of acrylamide with quaternary ammonium cationic monomers”, Polymer, February 1997, vol. 38, no. 3, pp. 667-675. Historical plant data is used to train the models (i.e., determine the model coefficients), and the objective function for a model is set so as to minimize model error on some arbitrary (but representative) training data set. The algorithms used to train these models focus on model error. Little or no attention is paid to the accuracy of the derivative of the converged function.

This focus on model error (without other considerations) prohibits the use of such paradigms (i.e., conventional neural networks) in closed loop control schemes since the objective of a non-linear model is usually to schedule the gain and lag of the controller. Although jacketing can be used to restrict the models from working in regions of one dimensional extrapolation, the models will be expected to interpolate between operating points. A linear or well behaved non-linear interpolation is therefore required. The gains may not match the actual process exactly but at the very least, the trajectory should be monotonically sympathetic to the general changes in the process gain when moving from one operating point to another.

Work has been undertaken to understand the stability of dynamic conventional neural networks in closed loop control schemes. Kulawski et al. have recently presented an adaptive control technique for non-linear stable plants with unmeasurable states (see Kulawski, G. J., Brdyś, M. A.: “Stable adaptive control with recurrent networks”, Automatica, 2000, vol. 36, pp. 5-22). The controller takes the form of a non-linear dynamic model used to compute a feedback linearizing controller. The stability of the scheme is shown theoretically. The Kulawski et al. paper emphasizes the importance of monotonic activation functions in the overall stability of the controller. However, the argument is not extended to the case of inappropriate gain estimation in areas of data sparseness.

Universal approximators (e.g., conventional neural networks) cannot guarantee that the derivatives will be well behaved when interpolating between two points. The very nature of these models means that any result could occur in the prediction of the output by the universal approximator in a region of missing or sparse data between two regions of sufficient data. Provided that the final two points on the trajectory fit, then the path between the points is unimportant. One of the key advantages of the present invention is that it uses a priori knowledge of the process gain trajectory (e.g., monotonic gain, bounded gain, etc.) and constrains the estimator to solutions that possess these properties.

The benefits of including a priori knowledge in the construction of non-linear approximators have been cited in many areas. Lindskog et al. discuss the monotonic constraining of fuzzy model structures and apply such an approach to the control of a water heating system (see Lindskog, P., Ljung, L.: “Ensuring monotonic gain characteristics in estimated models by fuzzy model structures”, Automatica, 2000, vol. 36, pp. 311-317). Yaser S. Abu-Mostafa discusses one method of “tempting” a neural network to have localized monotonic characteristics by “inventing” pseudo-training data that possesses the desired non-linear characteristics (see Yaser S. Abu-Mostafa: “Machines that learn from hints”, Scientific American, April 1995, pp. 64-69). This does not guarantee global adherence to this particular input/output relationship.

Thus, it is well accepted that universal approximators should not be used in extrapolating regions of data. Since they are capable of modeling any non-linearity, any result could occur in regions outside and including the limits of the training data range.

For process control, the constraining of the behavior of an empirical non-linear model (within its input domain) is essential for successful exploitation of non-linear advanced control. Universal approximators, such as conventional neural networks, cannot be used in advanced control schemes for gain scheduling without seriously deteriorating the potential control performance.

SUMMARY OF THE INVENTION

The present invention is an alternative that allows the gain trajectory and monotonicity of the non-linear empirical approximator to be controlled. Although not a universal approximator, the ability of the invention to “fit” well behaved functions is competitive with conventional neural networks yet without any of the instabilities that such an approach incurs. The main feature of the invention is to constrain the behavior of the derivative of the empirical model without adversely affecting the ability of the model to represent generic non-linear relationships.

The constrained non-linear approximators described in this invention address the issue of inappropriate gains in areas of data sparseness (e.g., in the training data) and provide a non-linear approximating environment with well behaved derivatives. The general shape of the gain trajectory is specified if required. Alternatively, the trajectory is “learned” during training and later investigated. The key to the present invention is that the constrained behavior of the model derivative is guaranteed across the entire input domain of the model (i.e., the whole range of possible values acceptable as input to the model), not just the training data region. Thus, the present invention does guarantee a global adherence to the gain trajectory constraints.

One approach that attempts to constrain conventional feedforward neural networks using gain-constrained training is described in Erik Hartmann, “Training Feedforward Neural Networks with Gain Constraints,” in Neural Computation, 12, 811-829 (2000). In this approach, constraints are set for each input/output for a model having multiple inputs and outputs. The approach of Hartmann does not guarantee that the model will have a constrained global behavior (e.g., across the entire model input domain). In contrast, the approach of the invention insures that the model has a constrained global behavior, as described in more detail herein.

In the preferred embodiment, there are three stages in developing a constrained non-linear approximator for an empirical process. The first stage is the specification of the general shape of the gain trajectory, which results in an initial model of the empirical process. This may be specified graphically, algebraically or generically (learned by the optimizer). The second stage of the invention is the interconnection of transfer (e.g., activation) functions, which allow non-linear approximation in a non-linear network model based on the initial model. The final stage of the invention is the constrained optimization of the model coefficients in an optimized model (i.e., constrained non-linear approximator) based on the non-linear network model, such that the general shape of the input/output mappings (and their corresponding derivatives) are conserved.

These three stages described above form the modeling part of the invention that utilizes the constraining algorithm for generating non-linear (dynamic or steady state) models that possess the desired gain trajectory. The techniques of the invention allow the user (i.e., model designer) to interrogate both the input/output and gain trajectory at random or specific points in the input data domain.

With the model (e.g., optimized non-linear model) built, the user may build a non-linear controller. The controller utilizes the optimized model in its prediction of the optimal trajectory to steady state (e.g., optimal gain trajectory of the desired output to reach a steady state process to produce the desired output). Accurate, non-linear predictions of the controlled variables and the process gains are available from the non-linear optimized model.

In another embodiment of the invention, the invention also allows further modeling (of either raw empirical or empirical/first principles hybrid or alternative hybrid structure) utilizing the gain trajectory constraining algorithm to generate a non-linear model of the process for further process optimization purposes (e.g., non-linear program) in either the interconnection stage or the constrained optimization stage (or both stages). The optimizer then uses this constrained model to identify optimal set points for the non-linear controller.

The invention may be used to model any form of an empirical process to produce a constrained non-linear approximator, where a priori knowledge of underlying system behavior is used to define a constraint on the optimization of the interconnected model of transfer functions (e.g., non-linear network model based on a layered architecture). For example, the techniques of the invention may be applied to, but are not limited to, any chemical or process model, financial forecasting, pattern recognition, retail modeling and batch process modeling.

Thus, the present invention provides a method and apparatus for modeling a non-linear empirical process. In particular, the present invention provides a computer apparatus including a model creator, a model constructor and an optimizer. The model creator creates an initial model generally corresponding to the non-linear empirical process to be modeled. The initial model has an initial input and an initial output. The initial model corresponds generally to the shape of the input/output mapping for the empirical process. Coupled to the model creator is a model constructor for constructing a non-linear network model based on the initial model. The non-linear network model has multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output. Coupled to the model constructor is an optimizer for optimizing the non-linear network model based on empirical inputs to produce an optimized model by constraining the global behavior of the non-linear network model. The optimized model provides one example of the constrained non-linear approximator. The resulting optimized model thus provides a global output that conforms to the general shape of the input/output mapping of the initial model, while being constrained so that the global output of the optimized model produces consistent results (e.g., monotonically increasing results) for the whole range of the input domain. The modeling apparatus and method described herein are applicable to any non-linear process.

In accord with another aspect of the invention, the model creator specifies a general shape of a gain trajectory for the non-linear empirical process. The resulting optimized model thus provides a global output that conforms to the general shape of the gain trajectory specified for the initial model.

In another aspect of the invention, the model creator specifies a non-linear transfer function suitable for use in approximating the non-linear empirical process. The non-linear network may include interconnected processing elements, and the model constructor incorporates the non-linear transfer function into at least one processing element. The optimizer may set constraints by taking a bounded derivative of the non-linear transfer function. In a preferred embodiment, the non-linear transfer function includes the log of a hyperbolic cosine function.
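As an illustration only, the following Python sketch shows why the log of a hyperbolic cosine lends itself to a bounded derivative: its derivative is the hyperbolic tangent, which lies strictly between -1 and 1 for every real input. The function names log_cosh and log_cosh_gain are hypothetical and do not come from the specification, and the exact form of the transfer function used in the preferred embodiment is not given in this section.

```python
import math


def log_cosh(x: float) -> float:
    """Return ln(cosh(x)) without overflowing for large |x|.

    Uses the identity ln(cosh(x)) = |x| + ln(1 + exp(-2|x|)) - ln(2).
    (Hypothetical helper name; not taken from the specification.)
    """
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def log_cosh_gain(x: float) -> float:
    """Derivative of ln(cosh(x)) with respect to x, which is tanh(x).

    The value always lies in [-1, 1], so the gain contributed by this
    function is bounded over the entire input domain.
    """
    return math.tanh(x)


if __name__ == "__main__":
    for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
        print(x, log_cosh(x), log_cosh_gain(x))
```

Because the derivative is bounded everywhere, and not only where training data exist, limits placed on the coefficients that scale this function translate into limits on the model gain across the whole input domain.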

In another aspect of the invention, the model constructor constructs the non-linear network model based on a layered network architecture having a feedforward network of nodes with input/output relationships to each other. The feedforward network includes transformation elements. Each transformation element has a non-linear transfer function, a weighted input coefficient and a weighted output coefficient. In this aspect, the optimizer constrains the global behavior of the non-linear network model to a monotonic transformation based on the initial input by pairing the weighted input and output coefficients for each transformation element in a complementary manner to provide the monotonic transformation. The complementary approach is also referred to as “complementarity pairing.” Using this approach, the optimizer insures that the global output of the optimized model is constrained to be, for example, monotonically increasing throughout the global output of the optimized model, and over the entire range of input values.

In a further aspect of the invention, the apparatus and method includes an advisory model that represents another model of the non-linear empirical process that is different from the initial model, the non-linear network model, and the optimized model. The optimizer may adjust the optimization of the optimized model based on information provided by the advisory model. The advisory model may be a first principles model of the non-linear empirical process. Thus, data from a first principles approach may be used to inform and influence the optimization process performed by the optimizer.

The non-linear empirical process may also be part of a greater process managed by a controller coupled to the optimizer. In this case, the optimizer communicates the optimized model to the controller for deployment in the controller. Thus the optimized model may be included as one component in some larger model that may use other modeling approaches for other components of the larger model.

The computer apparatus and method described herein thus provide more precise control (or prediction) of the empirical process and a reduction in variance of the output, because the constrained non-linear approximator (e.g., optimized model) provides more consistent and predictable output than traditional universal approximators.

In another aspect, the present invention provides a computer apparatus and method for modeling an industrial process. In particular, a computer apparatus and method for modeling a polymer process includes a model creator, a model constructor, and an optimizer. The model creator specifies a base non-linear function for an initial model generally corresponding to the polymer process to be modeled. The initial model includes an initial input and an initial output. The base non-linear function includes a log of a hyperbolic cosine function. Coupled to the model creator is the model constructor for constructing a non-linear network model based on the initial model. The non-linear network model includes the base non-linear function, and has multiple inputs based on the initial input. The global behavior for the non-linear network model as a whole conforms generally to the initial output. Coupled to the model constructor is an optimizer for optimizing the non-linear network model based on empirical inputs to produce an optimized model by constraining the global behavior of the non-linear network model by setting constraints based on taking a bounded derivative of the base non-linear function.

With the inclusion of a suitable function (e.g., the log of a hyperbolic cosine function) the non-linear network model and optimizer use a bounded derivative based on this function to set the constraints for the constrained non-linear approximator (e.g., optimized model). The resulting output global behavior is constrained in a manner generally conforming to the expected behavior for a polymer process throughout the entire input domain of input values for the polymer process, without the unpredictable behavior that may occur with universal approximators based on traditional neural network approaches. The apparatus and method of the invention provide a more precise control of a known or ongoing polymer process in an industrial facility, as well as providing more reliable control for a new polymer (or other chemical) product being introduced to the industrial facility. Furthermore, a transfer of a polymer process based on a constrained non-linear approximator may be more easily made to a similar industrial facility than a transfer based on polymer process models produced by conventional modeling techniques.

In general, the greater consistency and control of the constrained non-linear approximator insures a more predictable result for the global behavior of the model for any empirical process being modeled.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 is a block diagram of a computer implementation of a preferred embodiment of the present invention.

FIG. 2 is a diagram of the stages of developing a constrained non-linear approximator in the preferred embodiment.

FIG. 3 is an example of a constrained non-linear approximator architectural specification.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a block diagram of a preferred embodiment of the present invention method and apparatus as implemented in a digital processor 22. The illustrated computer apparatus 20 (and method) for constraining a non-linear approximator to model an empirical process is implemented on a digital processor 22, which hosts and executes a modeling module 24 and a controller 26 in working memory, such as RAM (random access memory). The modeling module 24 includes an initial model creator 34, a model constructor 36, and an optimizer 38. The components of the computer system 20 (e.g., controller 26, initial model creator 34, model constructor 36 and optimizer 38) are implemented on the digital processor 22, as shown in FIG. 1, or, in alternate embodiments, implemented in any combination on two or more digital processors in communication with each other in a distributed computing arrangement. In addition, the components 34, 36, and 38 may be implemented in an online environment where the controller 26 and/or other components 34, 36, or 38 interact with the empirical process being modeled or the components 34, 36, and 38 may be implemented in an offline environment.

The initial model 40 specified by a model designer using the initial model creator 34 provides a specification of the general relationship of a single input and single output for the empirical process to be modeled. The initial model 40 is a general (e.g., graphic) shape, a set of data points, a base non-linear function, or other suitable specification of the general input/output relationship for the model. The non-linear network model 42 generated by the model constructor 36 is a model of the empirical process based on the initial model 40 and a suitable modeling architecture, such as an interconnected layer approach, as will be discussed in more detail later. The non-linear network model 42 has multiple inputs based on the initial input of the initial model 40 and a global behavior for the non-linear network model 42 as a whole that conforms generally to the initial output of the initial model 40. The optimized model 44 is an optimized version of the non-linear network model 42 produced by the optimizer 38.

Model input 28 to the modeling module 24 is input from data files, another software program, another computer, input devices (e.g., keyboard, mouse, etc.), and the like. Empirical data input 30 to the controller 26 (or to the modeling module 24) is input from sensory devices (e.g., for a manufacturing process), monitoring software (e.g., for stock market prices), another software program, another computer, input devices (e.g., keyboard, mouse, etc.) and the like. Model output 32 is provided to the controller 26, another computer, storage memory, another software program, and/or output devices (e.g., display monitor, etc.). Controller output 46 is provided to actuators (e.g., to control part of a process in a manufacturing plant), an exchange (e.g., to place an order on a stock exchange), another computer, storage memory, another software program, and/or output devices (e.g., display monitor, etc.) and the like. It is to be understood that the computer system 20 may be linked by appropriate links to a local area network, wide area network, global network (e.g., Internet), or similar such networks for sharing or distributing input and output data.

In FIG. 1, the optimizer 38 is preferably an optimizer from the Aspen Open Solvers library of optimizers provided by Aspen Technology, Inc. of Cambridge, Mass. (assignee of the present invention). One such optimizer is DMO/SQP® also of Aspen Technology, Inc. Other non-linear optimizers may be suitable for use with the invention. In a preferred embodiment, the controller is Aspen Apollo, part of the Aspen Advantage Control Suite provided by Aspen Technology, Inc. Another controller 26 suitable for use with the invention is DMC Plus® by Aspen Technology, Inc. In one embodiment, the model constructor 36 is a generator of a non-linear network, such as provided by Aspen IQ™ by Aspen Technology, Inc.

In one embodiment, a computer program product 80, including a computer readable medium (e.g., one or more CDROM's, diskettes, tapes, etc.), provides software instructions for the initial model creator 34, model constructor 36, and/or optimizer 38. The computer program product 80 may be installed by any suitable software installation procedure, as is well known in the art. In another embodiment, the software instructions may also be downloaded over a wireless connection. A computer program propagated signal product 82 embodied on a propagated signal on a propagation medium (e.g., a radio wave, an infrared wave, a laser wave, a sound wave, or an electrical wave propagated over the Internet or other network) provides software instructions for the initial model creator 34, model constructor 36, and/or optimizer 38. In alternate embodiments, the propagated signal is an analog carrier wave or digital signal carried on the propagated medium. For example, the propagated signal may be a digitized signal propagated over the Internet or other network. In one embodiment, the propagated signal is a signal that is transmitted over the propagation medium over a period of time, such as the instructions for a software application sent in packets over a network over a period of milliseconds, seconds, minutes, or longer. In another embodiment, the computer readable medium of the computer program product 80 is a propagation medium that the computer may receive and read, such as by receiving the propagation medium and identifying a propagated signal embodied in the propagation medium, as described above for the computer program propagated signal product 82.

FIG. 2 is a diagram of the stages of developing the constrained non-linear approximator in the preferred embodiment. It is to be understood that the stages shown in FIG. 2 are equivalent to steps in a procedure to develop and optimize a non-linear constrained approximator and to provide further online optimization for it.

Stage 100 is the specification of the general I/O mapping trajectory, which represents the output of the initial model 40. A model designer uses the initial model creator 34 to specify the initial model 40 by indicating the general relationship between a single input and a single output (i.e., trajectory). The output or trajectory is intended to represent the behavior of an empirical process (e.g., a physical, chemical, economic, financial or other empirical process) over time. This stage 100 involves the specification of the general shape of the gain trajectory of a chemical process, such as a polymer process. In a polymer process, the gain trajectory represents the trajectory of the output of the polymer process as it progresses from an initial state (e.g., zero output state) to a steady state of polymer production, as in an industrial polymer production facility. The approach of the invention provides more control over the gain trajectory, thus providing a more precise grade transition that increases the percentage of first time in-specification production product.

One implementation of the general I/O mapping stage 100 process is shown in FIG. 1 by the initial model 40, which represents the result of this stage 100. For stage 100, the general I/O mapping is specified graphically, algebraically, or generically (i.e., learned by the optimizer 38). In one approach of using the invention, a model designer uses the initial model creator 34 to draw a graphical shape (i.e., initial model 40) on a display of the computer system 20 that represents a general graphical shape of the gain trajectory based on the designer's knowledge of the process. In another approach, a model designer may provide a table or database of input and output data that specifies a general shape of the I/O mapping for the initial model 40.

Furthermore, the general I/O mapping may be determined by a first principles model based on the basic physical properties of the process. Examples of such first principles modeling systems are provided by assignee Aspen Technology, Inc. of Cambridge, Mass. and are described in commonly assigned U.S. patent application Ser. No. 09/678,724, entitled “Computer Method and Apparatus for Determining State of Physical Properties in a Chemical Process,” now U.S. Pat. No. 6,862,562 to Treiber et al. issued on Mar. 1, 2005, and U.S. patent application Ser. No. 09/730,466, now U.S. Pat. No. 6,654,649 to Treiber et al. issued on Nov. 25, 2003, entitled “Computer Method and Apparatus for Optimized Controller in a Non-Linear Process,” both of which are incorporated herein by reference.

In a preferred embodiment, the model designer selects a base non-linear function that provides a general I/O shape that generally corresponds to the expected shape for the empirical process and serves as the initial model 40. For example, the model designer selects a base non-linear function that provides a non-linear monotonically increasing shape, which is suitable for many non-linear empirical processes, such as a polymer process or stock market behavior in response to certain influences (e.g., decreasing interest rates). Such a base non-linear function may be a hyperbolic function, such as a hyperbolic tangent or the log of a hyperbolic cosine, that provides a non-linear generally monotonically increasing shape. As discussed in more detail later, if the model designer selects an appropriate transfer function, such as the log of a hyperbolic cosine, then later stages of the process (i.e., stages 102 and 104) determine a bounded derivative of the base non-linear function to determine constraints for the constrained training stage 104 (i.e., optimizing stage).

In another embodiment of the invention, in stage 100, the general I/O mapping is determined (i.e., learned) by an optimizer (not necessarily the same optimizer as the optimizer 38 of FIG. 1). For example, an optimizer is used to train a neural network (not to be confused with the non-linear network of the model 42) based on empirical data input 30. The output of the neural network then represents a general shape I/O mapping that serves as the initial model 40. In this case, an optimizer serves as an initial model creator 34, and the neural network serves as the initial model 40.

Stage 102 is the specification of the architectural interconnections of transfer functions to create a non-linear network model 42 of the empirical process. One implementation of the architectural interconnection stage 102 is shown in FIG. 1 by the model constructor 36 which produces the non-linear network model 42 as the result of this stage 102. Stage 102 involves constructing the non-linear network model 42 based on the initial model 40 and setting up constraints for the non-linear network model 42 that the optimizer 38 later uses in the constrained training stage 104 to insure that the model output 32 of the optimized model 44 is within the constraints. In general, the constraints reflect a model designer's knowledge of how the empirical model should behave. In a preferred embodiment, the model designer chooses constraints that insure a monotonically increasing output for the global behavior of the optimized model 44 as a whole (e.g., for a polymer process). In other embodiments, the model designer chooses constraints to insure some other behavior, such as monotonically decreasing behavior, or output behavior having a restricted number of turning points (e.g., no more than one turning point). In a further embodiment, some other approach than one based primarily on the model designer's knowledge may be used to determine how the output behavior should be constrained, such as an analysis of an empirical process by a computer program to determine a general I/O mapping for the initial model 40 in stage 100 and appropriate constraints to be set up in stage 102.

In the preferred embodiment of stage 102, a non-linear transfer function is selected based on the base non-linear function (e.g., the non-linear transfer function is the same as the base non-linear function or modified in some way). The model constructor 36 establishes transformation elements and includes a non-linear transfer function in each transformation element. In addition, each transformation element has a weighted input coefficient and a weighted output coefficient. The model constructor 36 then combines the transformation elements in a feedforward network of nodes to form layers in a layered network architecture. Typically, each transformation element in one layer provides outputs to all the transformation elements in the next layer. Each transformation element in the next layer then processes the inputs received from all of the transformation elements in the previous layer, for example, by summing the inputs, and transforming the sum by the non-linear transfer function to produce outputs, which are then provided as inputs to the transformation elements in the next layer.
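The layered processing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the names layer_forward and network_forward are hypothetical, the hyperbolic tangent is used only as an example transfer function, and the separate weighted input and output coefficients of each transformation element are collapsed into a single weight per connection for brevity.

```python
import math
from typing import Callable, List, Sequence


def layer_forward(
    inputs: Sequence[float],
    weights: Sequence[Sequence[float]],
    transfer: Callable[[float], float] = math.tanh,
) -> List[float]:
    """Evaluate one layer of transformation elements.

    weights[j][i] is the (assumed) coefficient applied to input i on its
    way into element j. Each element sums its weighted inputs and passes
    the sum through the non-linear transfer function.
    """
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)))
        for row in weights
    ]


def network_forward(
    inputs: Sequence[float],
    layers: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Feed the inputs through each layer in turn (feedforward only)."""
    signal = list(inputs)
    for weights in layers:
        signal = layer_forward(signal, weights)
    return signal


if __name__ == "__main__":
    # Two inputs -> two hidden elements -> one output element.
    hidden = [[0.5, 0.2], [0.1, 0.9]]
    output = [[1.0, 0.7]]
    print(network_forward([0.3, -0.4], [hidden, output]))
```

Every element in a layer receives the outputs of all elements in the previous layer, which matches the fully connected arrangement the paragraph describes as typical.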

As described in more detail for the constrained training stage 104, the weighted input coefficients and weighted output coefficients are paired to ensure monotonicity in the outputs of each transformation element compared to the inputs, with the result that the global behavior of the non-linear network model 42 is constrained to a monotonic behavior. Such monotonic behavior is either a monotonically increasing behavior or monotonically decreasing behavior, depending on the shape of the initial model 40 based on the general behavior of the empirical process being modeled. In an approach of the invention referred to as “complementary pairing,” the weighted input coefficient(s) and the weighted output coefficient(s) for each transformation element are paired, so that all outputs have the same sign (negative or positive) as the inputs. For example, if all of the inputs to a transformation element are positive, then the complementary pairing approach ensures that all of the outputs of that transformation element are also positive.

The non-linear network model 42 constructed in stage 102 may be a neural network, but is not required by the invention to be a neural network. In general, conventional neural networks are universal approximators that may not perform predictably in areas of missing or sparse model input data 28, whereas the non-linear network model 42 of the invention is used to develop a constrained non-linear approximator in stage 104 that provides a reliable global behavior, such as increasing monotonicity, in regions of missing or sparse model input data 28 used in the constrained training stage 104.

In another embodiment, the base non-linear function is one suitable for use in providing a bounded derivative, and the bounded derivative of the base non-linear function is used to provide constraints during the constrained training stage 104, as will be discussed for that stage 104. Examples of the base non-linear function are functions based on the hyperbolic tangent, the sigmoidal function, and the log of a hyperbolic cosine function.

As described above, in a preferred embodiment, each transformation element in the layered network architecture for the non-linear network model 42 includes a non-linear transfer function based on the base non-linear function. The process of setting constraints by taking a bounded derivative is described in more detail later. It is to be understood that the transformation elements are not required by the invention to all have the same non-linear transfer function, and different transformation elements may have different non-linear transfer functions, not necessarily based on the base non-linear function determined in stage 100.

Stage 104 is the constrained training stage or paradigm, which optimizes the model coefficients such that the general shape of the I/O mappings that were specified in stage 100 are conserved during the training (i.e., optimizing) of the model. One implementation of the constrained training (i.e., optimizing) stage 104 is shown by the model optimizer 38 in FIG. 1, which produces the optimized model 44 as the result of this stage 104. Stage 104 involves optimizing the non-linear network model 42 based on empirical inputs (e.g., model input 28 or current empirical data input 30) to produce the optimized model 44 by constraining the global behavior of the non-linear network model 42. For stage 104, the model input 28 may represent historical process data, such as the historical data for an industrial process facility (e.g., polymer process facility) or historical data about an economic process (e.g., stock market), or a set of hypothetical model data that represents an empirical process. For stage 104, the empirical data input 30 may represent current empirical data from a currently active empirical process, such as an online industrial process facility or an economic process. In such a case, the optimizer 38 is receiving the empirical data input 30 in an online condition; that is, receiving the empirical data input 30 in a real-time or nearly real-time time frame (e.g., allowing for buffering or some other limited delay in receiving the data 30 after it is sensed or recorded from the active empirical process).

In stage 104, the optimizer 38 produces the optimized model 44 by constraining the behavior of the non-linear network model 42 while the model 42 receives the input data 28 or 30 to train the model 42 to conform to the general I/O mapping specified in the initial model 40 and constrained by the constraints set up in stage 102 (e.g., by complementary pairing, by a bounded derivative of the non-linear transfer function, or other constraint approach). In a preferred embodiment, the optimizer 38 constrains the model output 32 to be monotonically increasing based on the constraints as described in stage 102. In alternate embodiments, the optimizer 38 constrains the model output 32 by other criteria.

In general, in the preferred embodiment, the optimizer 38 seeks to optimize the non-linear network model 42 by examining the model error and adjusting the weights of the input and output coefficients for the transformation elements to reduce the model error. The optimizer 38 continually (or frequently) checks the results of the optimization compared to the constraints to ensure that any update to the model 42 satisfies the original constraints. If an updated version of the model 42 violates the constraints, the optimizer 38 adjusts the coefficients in a different direction (e.g., increases a coefficient value if it was previously decreased) in an attempt to bring the non-linear network model 42 within the constraints as part of the process of modifying the model 42 to become the optimized model 44.
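The check-and-adjust behavior described above can be illustrated with a minimal sketch. It is not the optimizer 38 itself: it assumes a hypothetical single-input model with one tanh hidden node, uses plain gradient descent, and swaps in a simple step-halving rule (a proposed update is rejected and retried with a smaller step whenever it would violate the constraint). The names `predict`, `loss_and_grad`, `feasible` and `constrained_step`, and the sample data, are illustrative assumptions.

```python
import math

def predict(x, w_in, w_out):
    """Hypothetical one-hidden-node model: y = w_out * tanh(w_in * x)."""
    return w_out * math.tanh(w_in * x)

def loss_and_grad(data, w_in, w_out):
    """Sum-of-squares model error and its gradient with respect to both weights."""
    loss = g_in = g_out = 0.0
    for x, target in data:
        h = math.tanh(w_in * x)
        err = w_out * h - target
        loss += 0.5 * err * err
        g_out += err * h
        g_in += err * w_out * (1.0 - h * h) * x
    return loss, g_in, g_out

def feasible(w_in, w_out):
    """Monotonically increasing output requires the paired weights to share a sign."""
    return w_in * w_out > 0.0

def constrained_step(data, w_in, w_out, rate=0.05, max_halvings=20):
    """One gradient step; halve the step until the update satisfies the constraint."""
    _, g_in, g_out = loss_and_grad(data, w_in, w_out)
    step = rate
    for _ in range(max_halvings):
        new_in, new_out = w_in - step * g_in, w_out - step * g_out
        if feasible(new_in, new_out):
            return new_in, new_out
        step *= 0.5
    return w_in, w_out  # keep the last feasible weights if no step is acceptable

# Hypothetical training data drawn from a monotonically increasing curve.
data = [(x, 2.0 * math.tanh(0.5 * x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w_in, w_out = 0.1, 0.1
start_loss = loss_and_grad(data, w_in, w_out)[0]
for _ in range(200):
    w_in, w_out = constrained_step(data, w_in, w_out)
final_loss = loss_and_grad(data, w_in, w_out)[0]
```

Any constrained non-linear optimizer could replace the step-halving rule; the point of the sketch is only that every accepted update is checked against the constraint before it is kept.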

Stage 106 is the model deployment, which involves the deployment of the optimized model 44 in an empirical situation, such as controlling an industrial process, or predicting an economic process (e.g., stock market).

One implementation of the model deployment stage 106 is shown in FIG. 1 by the controller 26, which functions to control an empirical process (e.g., polymer process) based on the optimized model 44 through the controller output 46 produced by the controller 26. In this stage 106, the controller 26 (or forecaster) receives empirical data input 30 from sensors that monitor the inputs and states of different aspects of an industrial process. The optimized model 44 processes the inputs and provides controller output 46 that is used to control the industrial process. For example, in a polymer process, the optimized model 44 adjusts the flow of a chemical into the process by electronically adjusting the setting on an input valve that controls the flow of that chemical.

In another implementation, the optimized model 44 is deployed as a predictor, as in a financial forecaster that serves to predict a financial process, such as the stock market. The financial forecaster may also serve as a financial controller 26 that requests financial actions based on the optimized model 44 of the financial process, such as requesting the purchase or sale of stock.

The controller 26 of stage 106 that is gain scheduled with the optimized model 44 (i.e., constrained non-linear approximator) is a more robust controller than one that is gain scheduled with a universal approximator, and the controller 26 behaves in a predictable manner over the entire operating range of the process.

Stage 108 is the hybrid modeling stage, which involves the inclusion or addition of other model structures (other than the initial model 40, the non-linear network model 42, and the optimized model 44), which may be used to influence the constrained training stage 104 or affect the model deployment stage 106.

In one approach, the other model structure is an advisory model that is used to advise, refine, or influence the training of the non-linear network model 42 in the constrained training stage 104. For example, the advisory model is a first principles model, such as a first principles model of a chemical (e.g., polymer) process.

By allowing for use of other models, the approach of the invention provides for a more precise prediction of both inferred properties and their derivatives by using a combination of engineering knowledge, first principles models, regression based models, and the constrained non-linear approximator described herein or part thereof.

In another approach, the other model provided in stage 108 is a greater or overall model that models a greater or overall empirical process. In this approach, the optimized model 44 is one part or aspect of the greater model, or the optimized model 44 represents one step or procedure in the greater process. For example, in a polymer process, the optimized model 44 may be a model for one component of the overall polymer process, such as a reactor. The optimized model 44 may also be considered a child of a parent that models the greater empirical process. Generally, the optimized model 44 may be included in or associated with a greater model, or provide input to the greater model, as well as advise, influence, or direct such a greater model. Furthermore, any of the other models 40 and 42 of the invention may be used with a greater model, and any of the components (i.e., initial model creator 34, model constructor 36, and optimizer 38) of the invention may be used with, associated with, included in, or provide input to a greater model, in a manner similar to what has been described for the optimized model 44 above.

Stage 110 is the constrained on-line model adaptation, involving the fine tuning or correcting of an optimized model 44 that has been deployed in the model deployment stage 106. Such fine tuning or adaptation of the optimized model 44 may be required if the controller 26 receives input for some new region of data that was not represented (or sparsely represented) by the model input 28 used to train the non-linear network model 42 in stage 104 to produce the optimized model 44. For example, the optimized model 44 (i.e., constrained non-linear approximator) provides output that is generally monotonically increasing in the new region, but may require further optimization to obtain an improved result. In addition, such adaptation may be required if the performance of the optimized model 44 as deployed in the controller 26 has deteriorated or has not met original expectations.

In stage 110, the optimizer 38 checks the results of the on-line optimization compared to the constraints to ensure that any update to the optimized model 44 satisfies the original constraints. If an updated version of the optimized model 44 violates the constraints, the optimizer 38 adjusts the coefficients in a different direction (e.g., increases a coefficient value if it was previously decreased) in an attempt to bring the model 44 within the constraints. In general, the process of constrained on-line model adaptation in stage 110 is similar to the process of constrained training in stage 104.

The modular nature of this invention means that each stage 100, 102 and 104 may be implemented independently of the others. As an example, the training algorithm described in stage 104 may be applied to a multilayer-perceptron neural network in order to restrict the function such that certain input/output relationships are monotonically constrained over their entire input domain.

The invention allows each input/output relationship to be treated in isolation. Hence, some input/output relationships may be left unconstrained, which allows them to retain complete universal approximating capability. Other input/output relationships may be constrained to be monotonic, and others may be given a general gain trajectory shape to adhere to.

The invention encompasses both steady state and dynamic modeling architectures that may be used for both gain scheduling and non-linear programs in steady state optimizers.

Mathematical Foundations of the Invention

The following sections describe the mathematical foundations of the invention. The headings are not meant to be limiting. A topic indicated in a heading may also be discussed elsewhere herein.

The following sections describe one implementation of the non-linear network model 42 described earlier for FIGS. 1 and 2.

General Structure

The monotonicity conditions are imposed on the non-linear network model 42 both through architecture (stage 102) and through constraining the training algorithm (stage 104). The following sections first define the calculations for a general, feedforward neural network (herein “neural net”) since it is clearer to describe first and second derivative calculations in general form. Later sections then look at the specific means of imposing monotonicity.

Notation

A general feedforward neural net consists of an ordered set of L layers. The position of each processing element (PE) in a layer is represented by a subscript; i, j, k, l, m, and n are used as PE indices. The processing element is one example of the transformation element described for stage 102 of FIG. 2. Each PE has a summation value $x_i$, an output value $y_i$, and a transfer function $f_i$ relating $x_i$ to $y_i$. Processing elements in different layers are distinguished if necessary by a superscript in parentheses; p, q, r, and s are used as layer indices. Weights between PEs are notated as $w_{ij}^{(p,q)}$, which represents the connection weight from $y_j^{(q)}$ to $x_i^{(p)}$, $q<p$.

Note that this allows for several layers to feed a given layer; bias is readily dealt with in this structure by specifying it as a single-element layer with its summation value $x_1 = 1$ and a linear transfer function.

Data Scaling

Neural nets require data to be scaled to normalized units. Typically, this is done by a linear mapping that transforms the training and test data to a mean of 0 and a standard deviation of 1.
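As a minimal sketch of such a linear mapping, the following assumes one measured variable at a time and uses the population standard deviation; the function names and the sample temperature values are hypothetical and serve only as illustration.

```python
import statistics

def fit_scaler(values):
    """Return the (mean, standard deviation) pair that defines the linear mapping."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0.0:
        raise ValueError("a constant variable cannot be scaled to unit deviation")
    return mean, std

def scale(values, mean, std):
    """Map raw values to normalized units (mean 0, standard deviation 1)."""
    return [(v - mean) / std for v in values]

def unscale(values, mean, std):
    """Map normalized values back to engineering units."""
    return [v * std + mean for v in values]

temps = [350.0, 355.0, 360.0, 365.0, 370.0]  # hypothetical training data
mean, std = fit_scaler(temps)
scaled = scale(temps, mean, std)
```

The same `mean` and `std` fitted on the training data would be reused to scale test data and to unscale model outputs.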

Feedforward Equations

$$
\begin{aligned}
x_i^{(1)} &= \text{data input} \\
&\;\;\vdots \\
y_i^{(p-1)} &= f_i^{(p-1)}\left(x_i^{(p-1)}\right) \\
x_i^{(p)} &= \sum_{q<p} \sum_j w_{ij}^{(p,q)}\, y_j^{(q)}
\end{aligned}
\tag{1}
$$

Objective Function

A set of measured data points is used for training the neural net (one example of the non-linear network model 42). This consists of a set of measured inputs and corresponding measured outputs (an example of the model input 28 used in training the non-linear network model 42 in stage 104 of FIG. 2). The neural net tries to recreate this mapping between measured inputs and measured outputs, so that outputs can be estimated in the absence of measurements. This training is achieved by constructing an objective function that is a measure of goodness of fit. However, the data also contains noise and spurious relationships, so the objective function also contains a term to inhibit complexity in the mapping.

Notationally:

$$J = J_D\left(\{y_i^{(L)}\}\right) + J_W\left(\{w_{ij}^{(p,q)}\}\right) \tag{2}$$

$J_D$ is the measure of how well the neural net fits the data and is a function of a data set, and indirectly, of the weights. $J_W$ is the regularization term which inhibits overfitting and is a direct function of the weights.

Derivatives

The derivative calculation in a feedforward neural net is referred to as back-propagation since it calculates derivatives of the objective with respect to the weights by propagating the derivative of the objective with respect to the outputs back through the network. This makes use of a chain rule that in the neural net world is attributed to Werbos. See Paul John Werbos, “The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting (Adaptive and learning systems for signal processing)”, January, 1994.

$$
\begin{aligned}
D_J y_i^{(L)} &= \partial J / \partial y_i^{(L)} \\
&\;\;\vdots \\
D_J x_i^{(p)} &= f_i'\left(x_i^{(p)}\right) D_J y_i^{(p)} \\
D_J y_i^{(p-1)} &= \sum_{q \ge p} \sum_j w_{ji}^{(q,p-1)}\, D_J x_j^{(q)} \\
&\;\;\vdots
\end{aligned}
\tag{3}
$$

Then calculate the weight gradient as:

$$D_J w_{ij}^{(p,q)} = y_j^{(q)}\, D_J x_i^{(p)}, \qquad q < p \tag{4}$$

Second Derivatives

Some optimizers (e.g., optimizer 38) make use of Hessian information. It turns out that Hessian information can be calculated analytically in a general feedforward neural net by passing information forwards and backwards through the network. The idea is to consider each of the derivatives from the previous section as appended to the original set of variables (x's, y's, and w's). Then use Werbos's chain rule to calculate the second derivatives. For each weight $w_{mn}^{(r,s)}$, let $\Im \equiv D_J w_{mn}^{(r,s)}$ be considered as the new objective. The goal is to calculate $D_\Im(w_{ij}^{(p,q)})$. Then perform a forward and backward pass through the network, starting at the given weight's destination layer, and ending at the given weight's source layer:

$$
\begin{aligned}
D_\Im\left(D_J y_i^{(p)}\right) &= 0, \qquad p < r \\
D_\Im\left(D_J x_i^{(r)}\right) &= \delta_{mi}\, y_n^{(s)} \\
&\;\;\vdots \\
D_\Im\left(D_J y_i^{(p-1)}\right) &= f_i'\left(x_i^{(p-1)}\right) D_\Im\left(D_J x_i^{(p-1)}\right) \\
D_\Im\left(D_J x_i^{(p)}\right) &= \sum_{q \ge r,\ q < p} \sum_j w_{ij}^{(p,q)}\, D_\Im\left(D_J y_j^{(q)}\right) \\
&\;\;\vdots \\
D_\Im\left(y_i^{(L)}\right) &= \left(\partial^2 J / \partial \left(y_i^{(L)}\right)^2\right) D_\Im\left(D_J y_i^{(L)}\right) \\
&\;\;\vdots \\
D_\Im\left(x_i^{(p)}\right) &= f_i'\left(x_i^{(p)}\right) D_\Im\left(y_i^{(p)}\right) + f_i''\left(x_i^{(p)}\right) D_J y_i^{(p)}\, D_\Im\left(D_J x_i^{(p)}\right) \\
D_\Im\left(y_i^{(p-1)}\right) &= \sum_{q \ge p} \sum_j w_{ji}^{(q,p-1)}\, D_\Im\left(x_j^{(q)}\right) \\
&\;\;\vdots \\
D_\Im\left(y_i^{(s)}\right) &= \delta_{ni}\, D_J x_m^{(r)} + \sum_{q > s} \sum_j w_{ji}^{(q,s)}\, D_\Im\left(x_j^{(q)}\right)
\end{aligned}
\tag{5}
$$

Then calculate the Hessian with respect to the weights using the formula:

$$\frac{\partial^2 J}{\partial w_{mn}^{(r,s)}\, \partial w_{ij}^{(p,q)}} \equiv D_\Im\left(w_{ij}^{(p,q)}\right) = y_j^{(q)}\, D_\Im\left(x_i^{(p)}\right) + D_J x_i^{(p)}\, D_\Im\left(D_J y_j^{(q)}\right), \qquad p \ge r,\ q \ge s \tag{6}$$

Note that the forward and backward pass through the network must be performed for each weight for which a 2nd order derivative is required. However, once this has been done, any of the second derivatives involving that weight can be easily calculated with two multiplications and an addition.

The summations, outputs, and back-propagated information from the original forward and backward pass (used to calculate the objective and the gradient) must be maintained during these Hessian passes, since the formulas make use of them. In addition, a Hessian forward and backward pass differs from the original as follows:

-   i. Feed $D_\Im(D_J x_i^{(r)})$ as the input (i.e., summation value) to the $r^{th}$ layer.
-   ii. In the feedforward pass:
    -   (a) The source layers below the $p^{th}$ layer are initialized to have output 0.
    -   (b) The original transfer function at each node is replaced by a scalar multiplication by the original $f_k'(x_k^{(m)})$.
-   iii. Calculate the value to feed back by multiplying the output from the feedforward pass by the Hessian of the original objective function J with respect to the original outputs. For standard RMS error based objectives, this Hessian is just a constant times the identity matrix.
-   iv. In the back-propagation pass:
    -   (a) Propagate back to the weight's source layer only.
    -   (b) There is now also a second derivative term for $D_\Im(x_i^{(p)})$, which is multiplied by the saved output from the feedforward step.
    -   (c) The derivative $D_\Im(y_n^{(s)})$ has an extra term $D_J x_m^{(r)}$ representing its direct influence on $\Im$.

Conventional Training

Conventional training algorithms for a standard feedforward neural net apply an unconstrained optimizer to minimize the objective function. Typically the only decision variables are the weights. The objective and its derivatives and second derivatives with respect to the weights are calculated using the above formulas.
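The feedforward recursion of equation (1) and the back-propagation recursions of equations (3) and (4) can be sketched as follows. This is an illustrative sketch only: the dictionary-based data layout, the function names, the small four-layer example network (input, bias, tanh hidden, linear output) and the squared-error objective are all assumptions made for the example, and the regularization term is omitted.

```python
import math

def forward(sources, sizes, weights, f):
    """Equation (1): each x^(p) sums weighted outputs of every feeding layer q < p."""
    x, y = {p: list(v) for p, v in sources.items()}, {}
    for p in sorted(sizes):
        if p not in x:  # not a source layer (data input or bias)
            x[p] = [sum(w[i][j] * y[q][j]
                        for (dest, q), w in weights.items() if dest == p
                        for j in range(sizes[q]))
                    for i in range(sizes[p])]
        y[p] = [f[p](v) for v in x[p]]
    return x, y

def backward(x, y, sizes, weights, df, dJ_dy_out):
    """Equations (3) and (4): propagate D_J back and form the weight gradient."""
    top = max(sizes)
    DJy = {p: [0.0] * sizes[p] for p in sizes}
    DJy[top] = list(dJ_dy_out)
    DJx = {}
    for p in sorted(sizes, reverse=True):
        # D_J x_i^(p) = f_i'(x_i^(p)) * D_J y_i^(p)
        DJx[p] = [df[p](x[p][i]) * DJy[p][i] for i in range(sizes[p])]
        # Scatter form of the sum in (3): every layer q fed by p receives its share.
        for (dest, q), w in weights.items():
            if dest == p:
                for i in range(sizes[p]):
                    for j in range(sizes[q]):
                        DJy[q][j] += w[i][j] * DJx[p][i]
    # Equation (4): D_J w_ij^(p,q) = y_j^(q) * D_J x_i^(p)
    return {(p, q): [[y[q][j] * DJx[p][i] for j in range(sizes[q])]
                     for i in range(sizes[p])]
            for (p, q) in weights}

def identity(v):
    return v

def one(v):
    return 1.0

# Hypothetical network: layer 1 = inputs, 2 = bias, 3 = tanh hidden, 4 = linear output.
sizes = {1: 2, 2: 1, 3: 2, 4: 1}
f = {1: identity, 2: identity, 3: math.tanh, 4: identity}
df = {1: one, 2: one, 3: lambda v: 1.0 - math.tanh(v) ** 2, 4: one}
weights = {(3, 1): [[0.5, -0.2], [0.1, 0.3]], (3, 2): [[0.1], [-0.1]],
           (4, 3): [[1.0, 2.0]], (4, 2): [[0.5]]}
sources = {1: [1.0, 2.0], 2: [1.0]}
target = 1.0

def objective(weights):
    """Squared-error data term for a single measured point."""
    _, y = forward(sources, sizes, weights, f)
    return 0.5 * (y[4][0] - target) ** 2

x, y = forward(sources, sizes, weights, f)
grads = backward(x, y, sizes, weights, df, [y[4][0] - target])
```

An unconstrained optimizer would use `grads` directly as the search direction; comparing an entry of `grads` against a finite-difference estimate of `objective` is a convenient way to check the recursion.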

Transfer Functions

For demonstration purposes, three transfer functions are described for use in a preferred embodiment of the present invention. The transfer functions for the invention described herein are not limited to these three examples. In different embodiments, the invention can utilize any non-linear transformation and still produce an enhanced model architecture (e.g., non-linear network model 42) for use in model based control and optimization schemes. The activation or other transformation may actually be a single input/single output neural (or other non-linear) network which could be trained on a user defined input/output or gain trajectory mapping (e.g., initial model 40). It is the constrained optimization (e.g., constrained training stage 104 of FIG. 2) that generates the robustness properties desirable in advanced control and optimization schemes. The sample transfer functions are: tanh, sigmoid and Asymmetric Bounded Derivative (ABD). Their formulas, derivatives, and second derivatives are as follows:

Tanh

$$y = \tanh(x), \qquad y' = 1 - y^2, \qquad y'' = -2 \cdot y \cdot y' \tag{7}$$

Sigmoid

$$y = 0.5\,(\tanh(x) + 1), \qquad y' = y - y^2, \qquad y'' = (1 - 2y) \cdot y' \tag{8}$$

ABD

$$y = \alpha \cdot x + \beta \cdot \ln(\cosh(x)), \qquad y' = \alpha + \beta \cdot \tanh(x), \qquad y'' = \beta \cdot \left(1 - \tanh^2(x)\right) \tag{9}$$

The ABD transfer function used in a preferred embodiment of the invention is monotonic positive under the following conditions:

$$\beta \ge 0,\ \alpha - \beta > 0 \qquad \text{or} \qquad \beta \le 0,\ \alpha + \beta > 0 \tag{10}$$

Another advantage of the ABD formulation (equations at (9)) is that the input/output relationship does not saturate at the extremes of the data. It is actually the derivative ($y' = \alpha + \beta \cdot \tanh(x)$) of the function ($y = \alpha \cdot x + \beta \cdot \ln(\cosh(x))$) that saturates, which yields linear models in regions of extrapolation (e.g., when entering regions of data that were missing or sparsely covered in the training data, such as model input 28).

Monotonic Neural Net Structure

The following sections describe examples for setting up constraints for a non-linear network model 42 in a preferred embodiment of the invention. The constraining conditions for monotonicity include, but are not limited to, the following:

Complementarity Conditions

The three sample transfer functions (equations 7, 8 and 9) described for this invention are monotonic transformations. The sigmoidal activation and hyperbolic tangent are also rotationally symmetric, i.e.

$$\tanh(x) = -\tanh(-x) \tag{11}$$

The law of superposition allows that if two positively monotonic functions are added together then the resulting transformation is also positively monotonic. Similarly, if two negatively monotonic functions are added together, the resulting transformation is negatively monotonic.

The output node of the non-linear network model 42 is essentially a linear summation of monotonic transformations. Hence, provided the sign of the coefficient which maps an input variable to a hidden node and the sign of the coefficient connecting this node to the output layer are complementary to the desired direction of monotonicity (for all hidden nodes), the overall monotonicity of the input/output relationship is conserved.

Example of Setting Complementarity Conditions

Suppose the desired input/output relationship is required to be positively monotonic. For a non-linear network model 42 with four hidden nodes whose output weights have signs (+,−,+,−) respectively, the corresponding coefficients mapping this input to each hidden node should also have signs (+,−,+,−) respectively. Two negatively signed coefficients in series produce a positively monotonic transformation as described in equation (11). Although the ABD transformation does not obey the rotational symmetry described in equation (11), the function −ABD(−x) is positively monotonic and so still produces an overall positive input/output monotonicity. The same logic applies for negative monotonic transformations.
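The four-hidden-node example above can be sketched numerically. The sketch assumes a single input, tanh hidden nodes and no bias; the weight magnitudes and the helper names `required_input_signs` and `model` are hypothetical and chosen only for illustration.

```python
import math

def required_input_signs(output_signs, direction=+1):
    """Sign each input-to-hidden weight needs so that every path through a
    hidden node has the desired direction (+1 increasing, -1 decreasing)."""
    return [direction * s for s in output_signs]

def model(x, w_in, w_out):
    """Single-input network: linear summation of tanh hidden nodes."""
    return sum(wo * math.tanh(wi * x) for wi, wo in zip(w_in, w_out))

w_out = [0.7, -1.2, 0.3, -0.5]                 # output weights with signs (+,-,+,-)
signs = required_input_signs([1, -1, 1, -1])   # complementary input signs
magnitudes = [0.4, 0.9, 1.5, 0.2]              # hypothetical magnitudes
w_in = [s * m for s, m in zip(signs, magnitudes)]

# Evaluate on a grid and confirm that the output never decreases.
grid = [k / 10.0 for k in range(-50, 51)]
outputs = [model(x, w_in, w_out) for x in grid]
is_increasing = all(b > a for a, b in zip(outputs, outputs[1:]))
```

Each hidden path contributes a term whose slope has the sign of the product of its two weights, so matching signs on every path gives a sum of increasing functions.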

The following sections provide two examples of constrained non-linear approximator (CNA) architectures suitable for use in developing examples of the non-linear network model 42 of stage 102 of FIG. 2. The first example illustrates a 6-layer non-linear layered network CNA architecture and the second example illustrates a 5-layer non-linear layered network CNA architecture. The use of the phrases “first example” and “second example” is not meant to be limiting in any way.

First Example of CNA Architecture (for Six Layers)

FIG. 3 is an example of a 6-layer constrained non-linear approximator (CNA) architectural specification for an example of a non-linear network, which may be used as the basis for one example of a non-linear network model 42. The actual architecture detailed in this diagram is the integral of a non-linear network where the non-linear hidden layer contains a summation followed by an ABD (e.g., ln(cosh(x))) transformation and where the integral of the non-linear network is considered equivalent to the non-linear network model 42. Although any layer architecture may be used in this invention, in the preferred embodiment the non-linear network integral is used, one example of which is the neural network integral. As previously discussed, conventional neural networks (e.g., used in universal approximators) are good at predicting input/output relationships but are poor predictors of derivatives. Hence, fitting a non-linear network integral to input/output data means that the non-linear network (i.e., the derivative of the non-linear network model 42) is the underlying architecture that is fitting the derivative of the relationship in the training data. This therefore forms a solution to the problem of generating robust, non-linear empirical models (e.g., non-linear network model 42) with well behaved derivatives. The examples of CNA architecture described here work well in closed loop control schemes such as chemical process industrial production facilities. In addition, because with this CNA architecture it is the model derivative (e.g., derivative of an optimized model 44 based on a non-linear network model 42) that saturates (not the actual input/output relationship), the models (e.g., optimized models 44) smoothly converge to linear models in regions of extrapolation.

Referring to FIG. 3, the non-linear network 50 includes an input layer 200, bias layer 201, transformed layer 202, linear hidden layer 203, non-linear activation layer 204, linear activation layer 205 and output layer 206. The input layer 200 includes one or more elements L0; the bias layer 201 includes one or more elements L1; the transformed layer 202 includes one or more elements L2; the linear hidden layer 203 includes one or more elements L3; the non-linear activation layer 204 includes one or more elements L4; the linear activation layer 205 includes one or more elements L5; and the output layer 206 includes one or more elements L6.

The training data (e.g., model input 28) is presented to the input layer 200. Each node L0 through L6 in the architecture represents a processing element (PE). Each processing element has one or more inputs and one or more outputs. The processing element (e.g., processing elements L3) typically sums any inputs to it and then passes this summation through a transfer function. The transfer function may be non-linear (as in the case of layer 204) or linear (where in effect the summed inputs form the output of the processing element).

Each arrow in FIG. 3 represents a model coefficient (or weighting). The connections (arrows) between the input layer 200 and transformed layer 202 are fixed at a value of 1 (in this example in FIG. 3). This is a transformation layer 202 which allows the direction of the input data to be changed (i.e., switch the coefficients to −1) if necessary.

The bias layer 201 provides a bias term. The connection of this layer 201 to the output layer 206 essentially represents the “constant” term that occurs when integrating a neural network.

Layer 203 is a hidden layer where the inputs are simply added together. No transformation is performed at this layer 203. In a conventional neural network, these summations would then be passed through a sigmoidal (s-shaped) or hyperbolic tangent activation function. In the integral case (i.e., integral approach using the techniques of the invention), the summations from layer 203 are passed through the integral of the hyperbolic tangent (namely integral(q*tanh(v*X)) = a*X + b*log(cosh(v*X)) + c). This is achieved by layers 204, 205 and 201. Finally, the transformed inputs from layer 205 are connected directly to the output layer 206. This connection represents the integral of the bias term in a conventional neural network.
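As a brief worked check of the integral quoted above: the tanh term integrates to a scaled log(cosh) term, so that b = q/v. The constant offset a in the integrand below is an assumption introduced here to account for the linear term a*X; it plays the same role as the α term in the ABD derivative of equation (9).

$$\frac{d}{dX}\left[\frac{q}{v}\,\ln\cosh(vX)\right] = q\,\tanh(vX) \quad\Longrightarrow\quad \int \left(a + q\,\tanh(vX)\right)dX = a\,X + \frac{q}{v}\,\ln\cosh(vX) + c$$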

The layered CNA architecture of FIG. 3 is an example of a non-linear network architecture that may be used in this invention. The example illustrated in FIG. 3 and a second example described in the following sections may be used in any application of non-linear empirical modeling.

Second Example of CNA Architecture (for Five Layers)

The following sections describe a second example of a CNA architecture suitable for use with the invention.

The monotonic neural net structure described here for the second CNA architecture example consists of five layers. The five layers include the input layer, bias layer, signed input layer, hidden layer and output layer. The invention is not limited to any specific number of layers. The invention encompasses any such constrained neural architectures which utilize a non-linear constraining optimization algorithm for the purpose of producing well behaved non-linear models for use in model based control and optimization schemes.

The non-standard layer is the Signed Input layer, which is used to represent the direction of the non-linearity.

Layer Scheme for Second Example of CNA Architecture

| Layer | # PEs | Transfer Function |
|---|---|---|
| 1. Input | # input variables | Linear |
| 2. Bias | 1 (constant output of 1) | Linear |
| 3. Signed Input | # input variables | Linear |
| 4. Hidden | user selected (default 4) | Hyperbolic Tangent, Sigmoid, or Asymmetric Bounded Derivative |
| 5. Output | 1 | Linear |

Connection Scheme for Second Example of CNA Architecture

The following table shows the connection scheme between layers. A full connection means that every PE in the source layer is connected to every PE in the destination layer. A corresponding connection implies that the source and destination layers have the same number of PEs and each PE in the source layer is connected to the corresponding PE in the destination layer.

| To \ From | Input | Bias | Signed Input | Hidden |
|---|---|---|---|---|
| Signed Input | Corresponding | | | |
| Hidden | | Full | Full | |
| Output | | Full | | Full |

Specifying Monotonicity for the Second Example of the CNA Architecture

In the approach referred to here as “complementarity pairing,” the model designer first is able to specify the monotonicity of each input variable to be one of the following:

-   Monotonic Positive
-   Monotonic Negative
-   Unknown Monotonicity
-   Non-monotonic

Denote the sets of indices corresponding to these four options as $I_+$, $I_-$, $I_?$, and $I_{non}$ respectively. Monotonicity is achieved by imposing constraints on the weights of the data paths between the signed input layer (layer 3) and the output PE layer (layer 5). These data paths are indirect via the hidden layer (layer 4). Using the indexing notation described in the section “Notation” herein, the constraints are specified as:

$$C_{ji} \equiv -w_{1j}^{(5,4)}\, w_{ji}^{(4,3)} < 0, \qquad i \in I_+ \cup I_- \cup I_? \tag{12}$$

Because the transfer functions at each layer are monotonic positive, each path between the signed input layer and the output PE represents a monotonic positive calculation. It is the job of the weights between the input layer and the signed input layer to provide the direction of the monotonicity.

Constraining the Direction of the Monotonicity for the Second Example of the CNA Architecture

If the direction of the monotonicity is specified in advance by the user, then the weight between the input and signed input is constrained to be of that sign. Otherwise there is no constraint put on that weight. Mathematically:

$$w_{ii}^{(3,1)} > 0, \quad i \in I_+ \qquad\qquad w_{ii}^{(3,1)} < 0, \quad i \in I_- \tag{13}$$

Objective Function for the Second Example of the CNA Architecture

Using the notation introduced in the preceding sections:

$$
\begin{aligned}
J_D &= \frac{1}{2K} \sum_{\text{data set}} \left(y_{meas}^{(L)} - y^{(L)}\right)^2 \\
J_W &= \frac{1}{2} \sum_{p,q=1}^{L} \beta^{(p,q)} \sum_{i,j} \left(w_{ij}^{(p,q)}\right)^2
\end{aligned}
\tag{14}
$$

where $\beta^{(p,q)}$ is a tuning parameter. For this implementation, all the $\beta^{(p,q)}$ are user settable as a single Regularization tuning parameter with a small default value, except for $\beta^{(3,1)}$, which is set to 0 so that monotonicity determination is unhindered.

Constraint Derivatives for the Second CNA Architecture

The constraint derivatives have a sparse structure. Each constraint has only 2 non-zero derivatives giving a total of 2×H×N_(M) non-zero constraint derivatives, where H is the number of hidden PEs and N_(M) is the number of monotonic input variables:

$\begin{matrix}{\left. \begin{matrix}{\frac{\partial C_{ji}}{\partial w_{1j}^{(5,4)}} = -w_{ji}^{(4,3)}} \\{\frac{\partial C_{ji}}{\partial w_{ji}^{(4,3)}} = -w_{1j}^{(5,4)}}\end{matrix} \right\}\quad i \in I_{+} \cup I_{-} \cup I_{?}} & (15)\end{matrix}$

Any suitable constrained non-linear optimizer 38 may now be used to generate the model solution. This completes the discussion of the Second CNA Architecture.

Constraints Based on a Bounded Derivative

In a preferred embodiment of the invention, constraints may be calculated based on an asymmetric bounded derivative. Referring to the example of a non-linear network 50 shown in FIG. 3, the general equation describing one example of the input/output relationship in FIG. 3 is:

$\begin{matrix}{y = w_{11}^{(6,1)} + \sum\limits_{i} w_{1i}^{(6,2)} w_{ii}^{(2,0)} x_{i} + \sum\limits_{j} w_{1j}^{(6,5)}\left( w_{jj}^{(5,4)}\log\left( \cosh\left( w_{j1}^{(3,1)} + \sum\limits_{i} w_{ji}^{(3,2)} w_{ii}^{(2,0)} x_{i} \right) \right) + w_{jj}^{(5,3)}\left( w_{j1}^{(3,1)} + \sum\limits_{i} w_{ji}^{(3,2)} w_{ii}^{(2,0)} x_{i} \right) \right)} & {{Equation}\mspace{14mu}(16)}\end{matrix}$

For the notation, refer to the “Notation” section provided previously herein.

In this example, the logarithm of the hyperbolic cosine has been chosen as the non-linear transfer (activation) function which provides a bounded derivative trajectory (the derivative of the log(cosh( )) function is the bounded hyperbolic tangent).
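As a minimal numerical sketch of this choice of transfer function, the following Python fragment evaluates log(cosh(x)) in a form that avoids overflow for large |x| and cross-checks that its slope is the hyperbolic tangent, which is bounded between -1 and +1. The helper names `log_cosh`, `log_cosh_derivative` and `central_difference` are illustrative assumptions and do not appear in the disclosure.

```python
import math


def log_cosh(x: float) -> float:
    """Evaluate log(cosh(x)) without overflowing for large |x|.

    Uses the identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2).
    """
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def log_cosh_derivative(x: float) -> float:
    """Analytical derivative of log(cosh(x)): the bounded hyperbolic tangent."""
    return math.tanh(x)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical slope of f at x, used only to cross-check the analytical form."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Comparing `central_difference(log_cosh, x)` with `log_cosh_derivative(x)` at a few sample points is one simple way to confirm the bounded-derivative property numerically.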

The derivative of equation 16 can be calculated as:

$\begin{matrix}{\frac{\partial y}{\partial x_{k}} = w_{1k}^{(6,2)} w_{kk}^{(2,0)} + \sum\limits_{j} w_{1j}^{(6,5)} w_{jk}^{(3,2)} w_{kk}^{(2,0)}\left( w_{jj}^{(5,4)}\tanh\left( w_{j1}^{(3,1)} + \sum\limits_{i} w_{ji}^{(3,2)} w_{ii}^{(2,0)} x_{i} \right) + w_{jj}^{(5,3)} \right)} & {{Equation}\mspace{14mu}(17)}\end{matrix}$

The theoretical bounds on the above function (equation 17) can be calculated as:

$\begin{matrix}{\begin{matrix}{\left. \frac{\partial y}{\partial x_{k}} \right|_{{bound}(1)} = w_{kk}^{(2,0)}\left( \sum\limits_{j} w_{1j}^{(6,5)} w_{jk}^{(3,2)} w_{jj}^{(5,3)} - \sum\limits_{j}\left| w_{1j}^{(6,5)} w_{jk}^{(3,2)} w_{jj}^{(5,4)} \right| + w_{1k}^{(6,2)} \right)} \\{\left. \frac{\partial y}{\partial x_{k}} \right|_{{bound}(2)} = w_{kk}^{(2,0)}\left( \sum\limits_{j} w_{1j}^{(6,5)} w_{jk}^{(3,2)} w_{jj}^{(5,3)} + \sum\limits_{j}\left| w_{1j}^{(6,5)} w_{jk}^{(3,2)} w_{jj}^{(5,4)} \right| + w_{1k}^{(6,2)} \right)}\end{matrix}} & {{Equations}\mspace{14mu}(18)\mspace{14mu}{and}\mspace{14mu}(19)}\end{matrix}$

The derivative of equation (16) is guaranteed to globally be within the bounds described by equations (18) and (19) due to the saturation of the hyperbolic tangent function between the above limits.

Which bound is the upper and which is the lower depends on the sign of w_(kk) ^((2,0)).
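The relationship between the point-wise gain of equation (17) and the global limits of equations (18) and (19) can be checked numerically. The sketch below is one possible illustration only: the dictionary layout of the coefficients (keys such as `w62`, `w20`, `w65`), the function names and the random test weights are assumptions made for this example, and the absolute value is taken over each product of coefficients multiplying the hyperbolic tangent term.

```python
import math
import random


def pre_activation(w, x, j):
    """Input to hidden element j: w31[j] + sum_i w32[j][i] * w20[i] * x[i]."""
    return w["w31"][j] + sum(
        w["w32"][j][i] * w["w20"][i] * x[i] for i in range(len(x))
    )


def gain(w, x, k):
    """dy/dx_k evaluated at the point x, following equation (17)."""
    total = w["w62"][k]
    for j in range(len(w["w65"])):
        slope = w["w54"][j] * math.tanh(pre_activation(w, x, j)) + w["w53"][j]
        total += w["w65"][j] * w["w32"][j][k] * slope
    return w["w20"][k] * total


def gain_bounds(w, k):
    """Return (lower, upper) global limits on dy/dx_k per equations (18), (19)."""
    hidden = range(len(w["w65"]))
    linear = sum(w["w65"][j] * w["w32"][j][k] * w["w53"][j] for j in hidden)
    swing = sum(abs(w["w65"][j] * w["w32"][j][k] * w["w54"][j]) for j in hidden)
    bound1 = w["w20"][k] * (linear - swing + w["w62"][k])
    bound2 = w["w20"][k] * (linear + swing + w["w62"][k])
    # Which one is the upper bound depends on the sign of w20[k].
    return min(bound1, bound2), max(bound1, bound2)


def random_weights(n_inputs, n_hidden, rng):
    """Draw arbitrary coefficients; this layout is an assumption of the sketch."""
    def r():
        return rng.uniform(-2.0, 2.0)
    return {
        "w62": [r() for _ in range(n_inputs)],
        "w20": [r() for _ in range(n_inputs)],
        "w65": [r() for _ in range(n_hidden)],
        "w54": [r() for _ in range(n_hidden)],
        "w53": [r() for _ in range(n_hidden)],
        "w31": [r() for _ in range(n_hidden)],
        "w32": [[r() for _ in range(n_inputs)] for _ in range(n_hidden)],
    }
```

Evaluating `gain` at many input points, including points far outside any training range, and comparing against `gain_bounds` shows the global character of the limits: the bounds depend only on the coefficients, not on the input.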

During training of the model 44, the above bounds can be calculated at each optimization iteration. The derivatives of the above bounds with respect to each coefficient in the model 44 can be calculated and constraints placed on the model 44 based on the above bounds lying within specified limits (e.g. a lower bound of zero and an upper bound of 1e+20 would guarantee that for that input, the input/output relationship would be globally positively monotonic). A lower bound of slightly greater than zero would guarantee global extrapolation capability.
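One way to hand such limits to an optimizer is to express them as quantities that must not exceed zero. The short sketch below assumes that convention; the function names and the default limits (zero and 1e+20, taken from the example in the preceding paragraph) are illustrative only.

```python
def gain_limit_violations(lower_bound, upper_bound,
                          lower_limit=0.0, upper_limit=1e20):
    """Return two values that are <= 0 exactly when the gain limits are met."""
    return (lower_limit - lower_bound, upper_bound - upper_limit)


def is_globally_monotonic_positive(lower_bound, upper_bound):
    """True when the global gain bounds lie inside [0, 1e+20]."""
    low_violation, high_violation = gain_limit_violations(lower_bound, upper_bound)
    return low_violation <= 0.0 and high_violation <= 0.0
```

Here `lower_bound` and `upper_bound` stand for the smaller and larger of the two values given by equations (18) and (19) for a particular input.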

If the inputs to the model 44 described in equation (16) are state vectors from for example a state space model, then the overall steady state gains between the actual model inputs and the output can be constrained by including the steady state contribution of each state variable to the output (for that particular input) as a linear set of weighting factors in equations (18) and (19). Examples of such state space models are provided by assignee Aspen Technology, Inc. of Cambridge, Mass. and are described in commonly assigned U.S. patent application Ser. No. 09/160,128, now U.S. Pat. No. 6,453,308 to Zhao et al. issued on Sep. 17, 2002 and filed Sep. 24, 1998, entitled “Non-linear Dynamic Predictive Device,” and U.S. Pat. No. 5,477,444, issued Dec. 19, 1995, entitled “Control System Using an Adaptive Neural Network for a Target and Path Optimization for a Multivariate, Nonlinear Process”, both of which are incorporated herein by reference.

Functioning of the Constrained Optimizer

This section describes how the optimizer 38 functions in producing the optimized model 44 from the non-linear network model 42.

The optimizer 38 requires an objective function. In this case, the objective function is typically the square of the model error E=(y−y_(target))². In order to minimize this objective function, the optimizer 38 requires information on how each coefficient of the non-linear network model 42 affects the model error $\left( {i.e.\frac{\partial E}{\partial w}} \right).$ The theory of backpropagation can be used to derive these relationships analytically for a layered network model architecture. This data is referred to as the ‘Jacobian’ of the non-linear network model 42. The backpropagation theory can be extended to include second derivative information (i.e. the Hessian). Armed with this information, the optimizer 38 can then begin its search to minimize the model error. In a preferred embodiment certain constraints are placed on this optimization. A simple case is the weight pairing constraints for the 5-layer non-linear network described herein.
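The analytical derivative information described above can be illustrated on a deliberately small case. The sketch below assumes a toy model with a single transformation element, y = w2·log(cosh(w1·x)), which is not one of the architectures of the disclosure; it applies the chain rule in the manner of backpropagation to obtain ∂E/∂w1 and ∂E/∂w2, and provides a central-difference estimate for comparison. All function names are hypothetical.

```python
import math


def log_cosh(z):
    """Overflow-safe log(cosh(z))."""
    a = abs(z)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def model(w1, w2, x):
    """Toy one-element network (an assumption): y = w2 * log(cosh(w1 * x))."""
    return w2 * log_cosh(w1 * x)


def squared_error(w1, w2, x, y_target):
    """Objective E = (y - y_target)^2 for a single data point."""
    return (model(w1, w2, x) - y_target) ** 2


def analytic_gradient(w1, w2, x, y_target):
    """(dE/dw1, dE/dw2) obtained with the chain rule, as backpropagation would."""
    dE_dy = 2.0 * (model(w1, w2, x) - y_target)
    dy_dw1 = w2 * math.tanh(w1 * x) * x
    dy_dw2 = log_cosh(w1 * x)
    return dE_dy * dy_dw1, dE_dy * dy_dw2


def finite_difference_gradient(w1, w2, x, y_target, h=1e-6):
    """Central-difference estimate of the same gradient, for comparison only."""
    d1 = (squared_error(w1 + h, w2, x, y_target)
          - squared_error(w1 - h, w2, x, y_target)) / (2.0 * h)
    d2 = (squared_error(w1, w2 + h, x, y_target)
          - squared_error(w1, w2 - h, x, y_target)) / (2.0 * h)
    return d1, d2
```

Agreement between the two gradients is a common sanity check before the analytical derivatives are passed to an optimizer.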

A constraint may be formulated as:

$\begin{matrix}{c_{1} = -w_{1} w_{2}} & (20)\end{matrix}$

Where the purpose of the constraint is that c₁ must always be negative. Hence w₁ and w₂ then have the same sign (where w₁ and w₂ are two weights that we may wish to constrain).
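Applied to every hidden PE j and every constrained input i, this pairing gives the constraints C_(ji) of equation (12). The following sketch, with hypothetical function names and a plain list-of-lists weight layout chosen only for illustration, computes each constraint value together with its two non-zero derivatives.

```python
def pairing_constraints(w_out, w_hidden, monotonic_inputs):
    """C[(j, i)] = -w_out[j] * w_hidden[j][i] for hidden PE j and input i."""
    return {
        (j, i): -w_out[j] * w_hidden[j][i]
        for j in range(len(w_out))
        for i in monotonic_inputs
    }


def pairing_constraint_derivatives(w_out, w_hidden, monotonic_inputs):
    """The two non-zero derivatives of each C[(j, i)]."""
    derivs = {}
    for j in range(len(w_out)):
        for i in monotonic_inputs:
            derivs[(j, i)] = {
                "d_w_out": -w_hidden[j][i],   # derivative w.r.t. w_out[j]
                "d_w_hidden": -w_out[j],      # derivative w.r.t. w_hidden[j][i]
            }
    return derivs
```

A constraint value is negative exactly when the paired weights share a sign, and the number of stored derivatives is 2×H×N_(M) for H hidden PEs and N_(M) constrained inputs.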

Hence, the optimizer 38 continuously calculates the above constraint. If during optimization, the value of c₁ (or any of the other constraints) reaches zero or goes positive, then the optimizer 38 shifts from trying to minimize the objective function E and concentrates on getting the constraint calculation back to less than zero. To do this, the optimizer 38 needs to know the derivatives of the constraint with respect to each of the coefficients in the constraint. Hence:

$\begin{matrix}{\frac{\partial c_{1}}{\partial w_{1}} = {- w_{2}}} & (21) \\{\frac{\partial c_{1}}{\partial w_{2}} = {- w_{1}}} & (22)\end{matrix}$

Armed with this information, the optimizer 38 attempts to eliminate the constraint violation. Optimization is terminated when no further reduction in the objective can be achieved.
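The alternation between minimizing E and restoring the constraint can be sketched with a very simple scheme. The code below is not the optimizer 38 and is not a production constrained non-linear solver; it is a toy feasibility-restoring gradient descent under stated assumptions: a hypothetical two-weight model y = w1·w2·x, fixed step sizes, and a fixed iteration count in place of a convergence test.

```python
def objective(w, x, y_target):
    """Squared model error E for the toy model y = w1 * w2 * x."""
    return (w[0] * w[1] * x - y_target) ** 2


def objective_gradient(w, x, y_target):
    """Analytical (dE/dw1, dE/dw2) for the toy model."""
    residual = w[0] * w[1] * x - y_target
    return [2.0 * residual * x * w[1], 2.0 * residual * x * w[0]]


def constraint(w):
    """c1 = -w1 * w2, which must stay below zero (equation (20))."""
    return -w[0] * w[1]


def constraint_gradient(w):
    """Derivatives of c1 from equations (21) and (22)."""
    return [-w[1], -w[0]]


def constrained_descent(w, x, y_target, step=0.05, restore_step=0.1,
                        iterations=200, max_restore=1000):
    """Alternate objective steps with constraint-restoring steps."""
    w = list(w)
    for _ in range(iterations):
        g = objective_gradient(w, x, y_target)
        w = [w[0] - step * g[0], w[1] - step * g[1]]
        # If c1 has reached zero or gone positive, stop minimizing E and
        # walk down the constraint gradient until c1 is negative again.
        for _ in range(max_restore):
            if constraint(w) < 0.0:
                break
            gc = constraint_gradient(w)
            w = [w[0] - restore_step * gc[0], w[1] - restore_step * gc[1]]
    return w
```

For example, with a target that the unconstrained model would fit using weights of opposite sign (x = 1, y_target = -1), starting from w = [1.0, 0.5], the returned weights keep the same sign while the error is reduced as far as the constraint allows.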

The pairing constraint (i.e., complementarity pairing) is just one example of how to constrain layered model architectures in order to guarantee a specific type of global behavior (in this case monotonicity). The approach of the invention may be used to constrain these models generally in order to achieve a specific global model behavior (not necessarily monotonicity). For example, the non-linear network integral architecture (or bounded derivative network) has specific bounds on the model derivative that can be calculated by the optimizer 38. Since they can be calculated, they can be constrained as a specific application of the present invention.

Alternative Optimization Strategies

The approaches described so far are examples of the many ways of constraining the neural networks in order to ascertain the salient features of the constrained non-linear approximator of the present invention. Alternative strategies may include (but are not limited to) optimization without analytical derivatives (e.g., finite difference approximation), penalty functions for non-monotonic solutions (e.g. input to hidden weight/hidden to output weight complementarity violations) and constrained optimization of the ABD activation functions where the constraints are the minimum and/or maximum derivative of each activation function and any linear combination thereof.
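As one hedged illustration of the penalty-function alternative, a complementarity violation can be charged to the objective instead of being enforced as a hard constraint. The quadratic form of the penalty, the `weight` parameter and the function names below are assumptions made for this sketch and are not prescribed by the disclosure.

```python
def complementarity_penalty(w_out, w_hidden, monotonic_inputs, weight=10.0):
    """Quadratic penalty on pairing violations: weight * sum(max(0, C_ji)^2)."""
    total = 0.0
    for j in range(len(w_out)):
        for i in monotonic_inputs:
            c_ji = -w_out[j] * w_hidden[j][i]  # positive when the signs differ
            total += max(0.0, c_ji) ** 2
    return weight * total


def penalized_objective(model_error, w_out, w_hidden, monotonic_inputs,
                        weight=10.0):
    """Model error plus the complementarity penalty."""
    return model_error + complementarity_penalty(
        w_out, w_hidden, monotonic_inputs, weight
    )
```

Unlike the hard constraint, such a penalty only discourages violations, so a solution obtained this way may need its monotonicity checked after training.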

While this invention has been particularly shown and described with references to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

1. A computer-implemented method for modeling a non-linear empirical industrial process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process.
 2. The method of claim 1, wherein the step of creating the initial model includes specifying a general shape of a gain trajectory for the non-linear empirical industrial process.
 3. The method of claim 1, wherein the step of creating the initial model includes specifying a non-linear transfer function suitable for use in approximating the non-linear empirical industrial process.
 4. The method of claim 3, wherein the non-linear network includes interconnected transformation elements and the step of constructing the non-linear network includes incorporating the non-linear transfer function into at least one transformation element.
 5. The method of claim 4, wherein the step of calibrating the non-linear model includes setting constraints by taking a bounded derivative of the non-linear transfer function.
 6. The method of claim 5, wherein the non-linear transfer function includes the log of a hyperbolic cosine function.
 7. The method of claim 1, wherein the non-linear network model is based on a layered network architecture having a feedforward network of nodes with input/output relationships to each other, the feedforward network having transformation elements; each transformation element having a non-linear transfer function, a weighted input coefficient and a weighted output coefficient; and the step of calibrating the non-linear network model includes constraining the global behavior of the non-linear network model to a monotonic transformation based on the initial input by pairing the weighted input and output coefficients for each transformation element in a complementary manner to provide the monotonic transformation.
 8. The method of claim 1, wherein the step of calibrating the non-linear network model comprises adjusting the calibration based on information provided by an advisory model that represents another model of the non-linear empirical industrial process that is different from the initial model, the non-linear network model, and the constrained model.
 9. The method of claim 8, wherein the advisory model is a first principles model of the non-linear empirical industrial process.
 10. A computer-implemented method for modeling a non-linear empirical industrial process, and controlling a greater process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process, the non-linear empirical industrial process being part of the greater process, and deploying the constrained model in a controller that controls the greater process.
 11. A computer apparatus for building a model for modeling a non-linear empirical industrial process, comprising: a model creator for creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output, the global behavior being at least in regions of sparse initial input; a model constructor coupled to the model creator for constructing a non-linear network model based on the initial model, the non-linear network model having multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output; and a calibrator coupled to the model constructor for calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process.
 12. The computer apparatus of claim 11, wherein the model creator specifies a general shape of a gain trajectory for the non-linear empirical industrial process.
 13. The computer apparatus of claim 11, wherein the model creator specifies a non-linear transfer function suitable for use in approximating the non-linear empirical industrial process.
 14. The computer apparatus of claim 13, wherein the non-linear network includes interconnected transformation elements and the model constructor incorporates the non-linear transfer function into at least one transformation element.
 15. The computer apparatus of claim 14, wherein the calibrator sets constraints by taking a bounded derivative of the non-linear transfer function.
 16. The computer apparatus of claim 15, wherein the non-linear transfer function includes the log of a hyperbolic cosine function.
 17. The computer apparatus of claim 11, wherein the model constructor constructs the non-linear network model based on a layered network architecture having a feedforward network of nodes with input/output relationships to each other, the feedforward network having transformation elements, each transformation element having a non-linear transfer function, a weighted input coefficient and a weighted output coefficient; and the calibrator constrains the global behavior of the non-linear network model to a monotonic transformation based on the initial input by pairing the weighted input and output coefficients for each transformation element in a complementary manner to provide the monotonic transformation.
 18. The computer apparatus of claim 11, further comprising an advisory model that represents another model of the non-linear empirical industrial process that is different from the initial model, the non-linear network model, and the constrained model; and wherein the calibrator adjusts the calibration based on information provided by the advisory model.
 19. The computer apparatus of claim 18, wherein the advisory model is a first principles model of the non-linear empirical industrial process.
 20. The computer apparatus of claim 11, wherein the non-linear empirical industrial process is part of a greater process managed by a controller coupled to controller optimizer, and the controller optimizer communicates the constrained model to the controller for deployment in the controller.
 21. A computer program product that includes a computer usable medium having computer program instructions stored thereon for building a model for modeling a non-linear empirical industrial process, such that the computer program instructions, when performed by a digital processor, cause the digital processor to: create an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; construct a non-linear network model based on the initial model, the non-linear network model having multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrate the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process.
 22. A computer-implemented method for building a model for modeling a polymer process, said method comprising the steps of: specifying a base non-linear function for an initial model generally corresponding to the polymer process to be modeled, the initial model including an initial input and an initial output and the base non-linear function including a log of a hyperbolic cosine function; constructing a non-linear network model based on the initial model and including the base non-linear function, the non-linear network model having multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output; and calibrating the non-linear network model based on empirical inputs of the polymer process by using a bound on a derivative of the base non-linear function to constrain parameters of the model in order to produce a constrained model with global behavior, the constrained model providing optimized approximations to a process controller for controlling the polymer process.
 23. A computer apparatus for building a model for modeling a polymer process, comprising: a model creator for specifying a base non-linear function for an initial model generally corresponding to the polymer process to be modeled, the initial model including an initial input and an initial output and the base non-linear function including a log of a hyperbolic cosine function; a model constructor coupled to the model creator for constructing a non-linear network model based on the initial model and including the base non-linear function, the non-linear network model having multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output; and a calibrator coupled to the model constructor for calibrating the non-linear network model based on empirical inputs of the polymer process by using a bound on a derivative of the base non-linear function to constrain parameters of the model in order to produce a constrained model with global behavior, the constrained model providing optimized approximations to a process controller for controlling the polymer process.
 24. A computer program product that includes a computer usable medium having computer program instructions stored thereon for building a model for modeling a polymer process, such that the computer program instructions, when performed by a digital processor, cause the digital processor to: specify a base non-linear function for an initial model generally corresponding to the polymer process to be modeled, the initial model including an initial input and an initial output and the base non-linear function including a log of a hyperbolic cosine function; construct a non-linear network model based on the initial model and including the base non-linear function, the non-linear network model having multiple inputs based on the initial input and a global behavior for the non-linear network model as a whole that conforms generally to the initial output; and calibrate the non-linear network model based on empirical inputs of the polymer process by using a bounded derivative of the base non-linear function to constrain the parameters of the model in order to produce a constrained model with global behavior, the constrained model providing optimized approximations to a process controller for controlling the polymer process.
 25. A computer-implemented method for modeling a non-linear empirical industrial process, the method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input or in regions of missing initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on a derivative of the base non-linear function to constrain parameters of the model to produce a constrained model with global behavior of the non-linear network model, the constrained model enabling precision control of the non-linear empirical industrial process.
 26. A computer implemented method for modeling a non-linear empirical industrial process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated and manipulated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process.
 27. A computer implemented method for modeling a non-linear empirical industrial process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated and manipulated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process, and the model coefficients being manipulated by using a modified base non-linear function.
 28. A computer implemented method for modeling a non-linear empirical industrial process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated and manipulated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process, and the model coefficients being manipulated by using a modified base non-linear function that excludes at least one of a hyperbolic tangent function, a radial basis function, and a sigmoid function, the base non-linear function has a global minimum or a global maximum first derivative that is independent of the model coefficients.
 29. A computer implemented method for modeling a non-linear empirical industrial process, said method comprising the steps of: creating an initial model generally corresponding to the non-linear empirical industrial process to be modeled, the initial model having a base non-linear function, an initial input and an initial output; constructing a non-linear network model based on the initial model, the non-linear network model having (a) multiple inputs based on the initial input and (b) a global behavior for the non-linear network model as a whole that conforms generally to the initial output, the global behavior being at least in regions of sparse initial input; and calibrating the non-linear network model based on empirical inputs of the non-linear empirical industrial process by using a bound on an analytical derivative of the base non-linear function that allows global properties including at least a global minimum value and a global maximum value of the analytical derivatives to be calculated and manipulated directly from model coefficients, the global properties used to produce, via a constrained nonlinear optimization method, an analytically constrained model with global behavior, the constrained model enabling precision control of the non-linear empirical industrial process, the global maximum and minimum values of the analytical derivatives both being a free function of the model coefficients.
 30. The computer implemented method of claim 29, wherein the base non-linear function excludes at least one of a hyperbolic tangent function, a radial basis function, a sigmoid function, and wherein a global minimum or a global maximum first derivative is independent of the model coefficients.